{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Grouping and Pivoting\n",
    "<hr style=\"height:0.6px;border:none;color:#666;background-color:#666;\" />\n",
    "\n",
    "In this section, we will answer the question:\n",
    "\n",
    "**What were the most popular male and female names in each year?**\n",
    "\n",
    "Here's the Baby Names dataset once again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Count</th>\n",
       "      <th>Year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mary</td>\n",
       "      <td>F</td>\n",
       "      <td>9217</td>\n",
       "      <td>1884</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Anna</td>\n",
       "      <td>F</td>\n",
       "      <td>3860</td>\n",
       "      <td>1884</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Emma</td>\n",
       "      <td>F</td>\n",
       "      <td>2587</td>\n",
       "      <td>1884</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Elizabeth</td>\n",
       "      <td>F</td>\n",
       "      <td>2549</td>\n",
       "      <td>1884</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Minnie</td>\n",
       "      <td>F</td>\n",
       "      <td>2243</td>\n",
       "      <td>1884</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Name Sex  Count  Year\n",
       "0       Mary   F   9217  1884\n",
       "1       Anna   F   3860  1884\n",
       "2       Emma   F   2587  1884\n",
       "3  Elizabeth   F   2549  1884\n",
       "4     Minnie   F   2243  1884"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "baby = pd.read_csv('babynames.csv')\n",
    "baby.head()\n",
    "# the .head() method outputs the first five rows of the DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Breaking the Problem Down**\n",
    "\n",
    "We decompose this problem into simpler table manipulations.\n",
    "\n",
    "1. Group the `baby` DataFrame by 'Year' and 'Sex'.\n",
    "2. For each group, compute the most popular name.\n",
    "\n",
    "Recognizing which operation is needed for each problem is sometimes tricky. Usually, a convoluted series of steps will signal to you that there might be a simpler way to express what you want. If we didn't immediately recognize that we needed to group, for example, we might write steps like the following:\n",
    "\n",
    "1. Loop through each unique year.\n",
    "2. For each year, loop through each unique sex.\n",
    "3. For each unique year and sex, find the most common name.\n",
    "\n",
    "```{note}\n",
    "There is almost always a better alternative to looping over a `pandas` DataFrame. In particular, looping over unique values of a DataFrame should usually be replaced with a group.\n",
    "```\n",
    "\n",
    "## Grouping\n",
    "<hr>\n",
    "\n",
    "To group in `pandas`. we use the `.groupby()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7ff990dd36d0>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "baby.groupby('Year')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`.groupby()` returns a strange-looking `DataFrameGroupBy` object. We can call `.agg()` on this object with an aggregation function in order to get a familiar output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1880</th>\n",
       "      <td>2000</td>\n",
       "      <td>2000</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1881</th>\n",
       "      <td>1935</td>\n",
       "      <td>1935</td>\n",
       "      <td>1935</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1882</th>\n",
       "      <td>2127</td>\n",
       "      <td>2127</td>\n",
       "      <td>2127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1883</th>\n",
       "      <td>2084</td>\n",
       "      <td>2084</td>\n",
       "      <td>2084</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1884</th>\n",
       "      <td>2297</td>\n",
       "      <td>2297</td>\n",
       "      <td>2297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012</th>\n",
       "      <td>33715</td>\n",
       "      <td>33715</td>\n",
       "      <td>33715</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>33253</td>\n",
       "      <td>33253</td>\n",
       "      <td>33253</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>33206</td>\n",
       "      <td>33206</td>\n",
       "      <td>33206</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <td>33063</td>\n",
       "      <td>33063</td>\n",
       "      <td>33063</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016</th>\n",
       "      <td>32868</td>\n",
       "      <td>32868</td>\n",
       "      <td>32868</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>137 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       Name    Sex  Count\n",
       "Year                     \n",
       "1880   2000   2000   2000\n",
       "1881   1935   1935   1935\n",
       "1882   2127   2127   2127\n",
       "1883   2084   2084   2084\n",
       "1884   2297   2297   2297\n",
       "...     ...    ...    ...\n",
       "2012  33715  33715  33715\n",
       "2013  33253  33253  33253\n",
       "2014  33206  33206  33206\n",
       "2015  33063  33063  33063\n",
       "2016  32868  32868  32868\n",
       "\n",
       "[137 rows x 3 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The aggregation function takes in a series of values for each group\n",
    "# and outputs a single value\n",
    "def length(series):\n",
    "    return len(series)\n",
    "\n",
    "# Count up number of values for each year. This is equivalent to\n",
    "# counting the number of rows where each year appears.\n",
    "baby.groupby('Year').agg(length)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You might notice that the `length` function simply calls the `len` function, so we can simplify the code above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1880</th>\n",
       "      <td>2000</td>\n",
       "      <td>2000</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1881</th>\n",
       "      <td>1935</td>\n",
       "      <td>1935</td>\n",
       "      <td>1935</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1882</th>\n",
       "      <td>2127</td>\n",
       "      <td>2127</td>\n",
       "      <td>2127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1883</th>\n",
       "      <td>2084</td>\n",
       "      <td>2084</td>\n",
       "      <td>2084</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1884</th>\n",
       "      <td>2297</td>\n",
       "      <td>2297</td>\n",
       "      <td>2297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012</th>\n",
       "      <td>33715</td>\n",
       "      <td>33715</td>\n",
       "      <td>33715</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>33253</td>\n",
       "      <td>33253</td>\n",
       "      <td>33253</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>33206</td>\n",
       "      <td>33206</td>\n",
       "      <td>33206</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <td>33063</td>\n",
       "      <td>33063</td>\n",
       "      <td>33063</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016</th>\n",
       "      <td>32868</td>\n",
       "      <td>32868</td>\n",
       "      <td>32868</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>137 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       Name    Sex  Count\n",
       "Year                     \n",
       "1880   2000   2000   2000\n",
       "1881   1935   1935   1935\n",
       "1882   2127   2127   2127\n",
       "1883   2084   2084   2084\n",
       "1884   2297   2297   2297\n",
       "...     ...    ...    ...\n",
       "2012  33715  33715  33715\n",
       "2013  33253  33253  33253\n",
       "2014  33206  33206  33206\n",
       "2015  33063  33063  33063\n",
       "2016  32868  32868  32868\n",
       "\n",
       "[137 rows x 3 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "baby.groupby('Year').agg(len)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A further shorthand to accomplish the same result would be by using .count() method as our aggregating function. Pandas has shorthands for common aggregation functions, including `count`,` sum`, and `mean`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1880</th>\n",
       "      <td>2000</td>\n",
       "      <td>2000</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1881</th>\n",
       "      <td>1935</td>\n",
       "      <td>1935</td>\n",
       "      <td>1935</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1882</th>\n",
       "      <td>2127</td>\n",
       "      <td>2127</td>\n",
       "      <td>2127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1883</th>\n",
       "      <td>2084</td>\n",
       "      <td>2084</td>\n",
       "      <td>2084</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1884</th>\n",
       "      <td>2297</td>\n",
       "      <td>2297</td>\n",
       "      <td>2297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012</th>\n",
       "      <td>33715</td>\n",
       "      <td>33715</td>\n",
       "      <td>33715</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>33253</td>\n",
       "      <td>33253</td>\n",
       "      <td>33253</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>33206</td>\n",
       "      <td>33206</td>\n",
       "      <td>33206</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <td>33063</td>\n",
       "      <td>33063</td>\n",
       "      <td>33063</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016</th>\n",
       "      <td>32868</td>\n",
       "      <td>32868</td>\n",
       "      <td>32868</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>137 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       Name    Sex  Count\n",
       "Year                     \n",
       "1880   2000   2000   2000\n",
       "1881   1935   1935   1935\n",
       "1882   2127   2127   2127\n",
       "1883   2084   2084   2084\n",
       "1884   2297   2297   2297\n",
       "...     ...    ...    ...\n",
       "2012  33715  33715  33715\n",
       "2013  33253  33253  33253\n",
       "2014  33206  33206  33206\n",
       "2015  33063  33063  33063\n",
       "2016  32868  32868  32868\n",
       "\n",
       "[137 rows x 3 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "baby.groupby('Year').count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The aggregation is applied to each column of the DataFrame, producing redundant information. We can restrict the output columns by slicing before grouping."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Year</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1880</th>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1881</th>\n",
       "      <td>1935</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1882</th>\n",
       "      <td>2127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1883</th>\n",
       "      <td>2084</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1884</th>\n",
       "      <td>2297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012</th>\n",
       "      <td>33715</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>33253</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>33206</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <td>33063</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016</th>\n",
       "      <td>32868</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>137 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      Count\n",
       "Year       \n",
       "1880   2000\n",
       "1881   1935\n",
       "1882   2127\n",
       "1883   2084\n",
       "1884   2297\n",
       "...     ...\n",
       "2012  33715\n",
       "2013  33253\n",
       "2014  33206\n",
       "2015  33063\n",
       "2016  32868\n",
       "\n",
       "[137 rows x 1 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "year_rows = baby[['Year', 'Count']].groupby('Year').agg(len)\n",
    "year_rows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the index of the resulting DataFrame now contains the unique years, so we can slice subsets of years using `.loc` as before:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Year</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1880</th>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1900</th>\n",
       "      <td>3730</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1920</th>\n",
       "      <td>10755</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1940</th>\n",
       "      <td>8961</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1960</th>\n",
       "      <td>11924</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1980</th>\n",
       "      <td>19440</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td>29764</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Count\n",
       "Year       \n",
       "1880   2000\n",
       "1900   3730\n",
       "1920  10755\n",
       "1940   8961\n",
       "1960  11924\n",
       "1980  19440\n",
       "2000  29764"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Every twentieth year starting at 1880\n",
    "year_rows.loc[1880:2016:20, :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Grouping on Multiple Columns\n",
    "<hr>\n",
    "\n",
    "We can group on multiple columns to get groups based on unique pairs of values. To do this, pass in a list of column labels into `.groupby()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Year</th>\n",
       "      <th>Sex</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1880</th>\n",
       "      <th>F</th>\n",
       "      <td>90992</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>110491</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1881</th>\n",
       "      <th>F</th>\n",
       "      <td>91953</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>100743</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1882</th>\n",
       "      <th>F</th>\n",
       "      <td>107847</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <th>M</th>\n",
       "      <td>1913434</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2015</th>\n",
       "      <th>F</th>\n",
       "      <td>1776538</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>1907211</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2016</th>\n",
       "      <th>F</th>\n",
       "      <td>1756647</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>1880674</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>274 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            Count\n",
       "Year Sex         \n",
       "1880 F      90992\n",
       "     M     110491\n",
       "1881 F      91953\n",
       "     M     100743\n",
       "1882 F     107847\n",
       "...           ...\n",
       "2014 M    1913434\n",
       "2015 F    1776538\n",
       "     M    1907211\n",
       "2016 F    1756647\n",
       "     M    1880674\n",
       "\n",
       "[274 rows x 1 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grouped_counts = baby.groupby(['Year', 'Sex']).sum()\n",
    "grouped_counts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code above computes the total number of babies born for each year and sex. We can now use grouping by multiple columns to compute the most popular names for each year and sex. Since the data are already sorted in descending order of Count for each year and sex, we can define an aggregation function that returns the first value in each series. (If the data weren't sorted, we can call `sort_values()` first.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Year</th>\n",
       "      <th>Sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1880</th>\n",
       "      <th>F</th>\n",
       "      <td>Mary</td>\n",
       "      <td>7065</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>John</td>\n",
       "      <td>9655</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1881</th>\n",
       "      <th>F</th>\n",
       "      <td>Mary</td>\n",
       "      <td>6919</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>John</td>\n",
       "      <td>8769</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1882</th>\n",
       "      <th>F</th>\n",
       "      <td>Mary</td>\n",
       "      <td>8148</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <th>M</th>\n",
       "      <td>Noah</td>\n",
       "      <td>19263</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2015</th>\n",
       "      <th>F</th>\n",
       "      <td>Emma</td>\n",
       "      <td>20415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>Noah</td>\n",
       "      <td>19594</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2016</th>\n",
       "      <th>F</th>\n",
       "      <td>Emma</td>\n",
       "      <td>19414</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>Noah</td>\n",
       "      <td>19015</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>274 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          Name  Count\n",
       "Year Sex             \n",
       "1880 F    Mary   7065\n",
       "     M    John   9655\n",
       "1881 F    Mary   6919\n",
       "     M    John   8769\n",
       "1882 F    Mary   8148\n",
       "...        ...    ...\n",
       "2014 M    Noah  19263\n",
       "2015 F    Emma  20415\n",
       "     M    Noah  19594\n",
       "2016 F    Emma  19414\n",
       "     M    Noah  19015\n",
       "\n",
       "[274 rows x 2 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The most popular name is simply the first one that appears in the series\n",
    "def most_popular(series):\n",
    "    return series.iloc[0]\n",
    "\n",
    "baby_pop = baby.groupby(['Year', 'Sex']).agg(most_popular)\n",
    "baby_pop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that grouping by multiple columns results in multiple labels for each row. This is called a \"multilevel index\" and is tricky to work with. The important thing to know is that `.loc` takes in a tuple for the row index instead of a single value:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Emily'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "baby_pop.loc[(2000, 'F'), 'Name']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But `.iloc` behaves the same as usual since it uses indices instead of labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Year</th>\n",
       "      <th>Sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1885</th>\n",
       "      <th>F</th>\n",
       "      <td>Mary</td>\n",
       "      <td>9128</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>John</td>\n",
       "      <td>8756</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1886</th>\n",
       "      <th>F</th>\n",
       "      <td>Mary</td>\n",
       "      <td>9889</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>John</td>\n",
       "      <td>9026</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1887</th>\n",
       "      <th>F</th>\n",
       "      <td>Mary</td>\n",
       "      <td>9888</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          Name  Count\n",
       "Year Sex             \n",
       "1885 F    Mary   9128\n",
       "     M    John   8756\n",
       "1886 F    Mary   9889\n",
       "     M    John   9026\n",
       "1887 F    Mary   9888"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "baby_pop.iloc[10:15, :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pivoting\n",
    "<hr>\n",
    "\n",
    "**If you group by two columns, you can often use pivot to present your data in a more convenient format.** Using a pivot lets you use one set of grouped labels as the columns of the resulting table.\n",
    "\n",
    "To pivot, use the `pd.pivot_table()` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Sex</th>\n",
       "      <th>F</th>\n",
       "      <th>M</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1880</th>\n",
       "      <td>Mary</td>\n",
       "      <td>John</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1881</th>\n",
       "      <td>Mary</td>\n",
       "      <td>John</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1882</th>\n",
       "      <td>Mary</td>\n",
       "      <td>John</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1883</th>\n",
       "      <td>Mary</td>\n",
       "      <td>John</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1884</th>\n",
       "      <td>Mary</td>\n",
       "      <td>John</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012</th>\n",
       "      <td>Sophia</td>\n",
       "      <td>Jacob</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>Sophia</td>\n",
       "      <td>Noah</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>Emma</td>\n",
       "      <td>Noah</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <td>Emma</td>\n",
       "      <td>Noah</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016</th>\n",
       "      <td>Emma</td>\n",
       "      <td>Noah</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>137 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "Sex        F      M\n",
       "Year               \n",
       "1880    Mary   John\n",
       "1881    Mary   John\n",
       "1882    Mary   John\n",
       "1883    Mary   John\n",
       "1884    Mary   John\n",
       "...      ...    ...\n",
       "2012  Sophia  Jacob\n",
       "2013  Sophia   Noah\n",
       "2014    Emma   Noah\n",
       "2015    Emma   Noah\n",
       "2016    Emma   Noah\n",
       "\n",
       "[137 rows x 2 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.pivot_table(baby,\n",
    "               index='Year',         # Index for rows\n",
    "               columns='Sex',        # Columns\n",
    "               values='Name',        # Values in table\n",
    "               aggfunc=most_popular) # Aggregation function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare this result to the `baby_pop` table that we computed using `.groupby()`. We can see that the `Sex` index in `baby_pop` became the columns of the pivot table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Year</th>\n",
       "      <th>Sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">1880</th>\n",
       "      <th>F</th>\n",
       "      <td>Mary</td>\n",
       "      <td>7065</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>John</td>\n",
       "      <td>9655</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1881</th>\n",
       "      <th>F</th>\n",
       "      <td>Mary</td>\n",
       "      <td>6919</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <th>M</th>\n",
       "      <td>Noah</td>\n",
       "      <td>19594</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">2016</th>\n",
       "      <th>F</th>\n",
       "      <td>Emma</td>\n",
       "      <td>19414</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>Noah</td>\n",
       "      <td>19015</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>274 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          Name  Count\n",
       "Year Sex             \n",
       "1880 F    Mary   7065\n",
       "     M    John   9655\n",
       "1881 F    Mary   6919\n",
       "...        ...    ...\n",
       "2015 M    Noah  19594\n",
       "2016 F    Emma  19414\n",
       "     M    Noah  19015\n",
       "\n",
       "[274 rows x 2 columns]"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "baby_pop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "<hr>\n",
    "\n",
    "We now have the most popular baby names for each sex and year in our dataset and learned to express the following operations in `pandas`:\n",
    "\n",
    "| Operation | `pandas` |\n",
    "| --------- | -------  |\n",
    "| Group | `df.groupby(label)` |\n",
    "| Group by multiple columns | `df.groupby([label1, label2])` |\n",
    "| Group and aggregate | `df.groupby(label).agg(func)` |\n",
    "| Pivot | `pd.pivot_table()` |"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}